Goto

Collaborating Authors

 multi-domain evaluation


What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation

Neural Information Processing Systems

While semantic segmentation has seen tremendous improvements in the past, there are still significant labeling efforts necessary and the problem of limited generalization to classes that have not been present during training. To address this problem, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Zero-Shot Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets according to the developed taxonomy. We select a representative subset consisting of 22 datasets and propose it as the MESS benchmark. We evaluate eight recently published models on the proposed MESS benchmark and analyze characteristics for the performance of zero-shot transfer models.


What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation

Neural Information Processing Systems

While semantic segmentation has seen tremendous improvements in the past, there are still significant labeling efforts necessary and the problem of limited generalization to classes that have not been present during training. To address this problem, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Zero-Shot Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets according to the developed taxonomy. We select a representative subset consisting of 22 datasets and propose it as the MESS benchmark.


A Multi-Domain Evaluation of Scaling in a General Episodic Memory

Derbinsky, Nate (University of Michigan) | Li, Justin (University of Michigan) | Laird, John (University of Michigan)

AAAI Conferences

Episodic memory endows agents with numerous general cognitive capabilities, such as action modeling and virtual sensing. However, for long-lived agents, there are numerous unexplored computational challenges in supporting useful episodic-memory functions while maintaining real-time reactivity. In this paper, we review the implementation of episodic memory in Soar and present an expansive evaluation of that system. We demonstrate useful applications of episodic memory across a variety of domains, including games, mobile robotics, planning, and linguistics. In these domains, we characterize properties of environments, tasks, and episodic cues that affect performance, and evaluate the ability of Soar’s episodic memory to support hours to days of real-time operation.